Search | VHL Regional Portal

1.

Imputed genomes of historical horses provide insights into modern breeding.

Todd, Evelyn T; Fromentier, Aurore; Sutcliffe, Richard; Running Horse Collin, Yvette; Perdereau, Aude; Aury, Jean-Marc; Èche, Camille; Bouchez, Olivier; Donnadieu, Cécile; Wincker, Patrick; Kalbfleisch, Ted; Petersen, Jessica L; Orlando, Ludovic.

iScience ; 26(7): 107104, 2023 Jul 21.

Article in English | MEDLINE | ID: mdl-37416458

ABSTRACT

Historical genomes can provide important insights into recent genomic changes in horses, especially the development of modern breeds. In this study, we characterized 8.7 million genomic variants from a panel of 430 horses from 73 breeds, including newly sequenced genomes from 20 Clydesdales and 10 Shire horses. We used this modern genomic variation to impute the genomes of four historically important horses, consisting of publicly available genomes from 2 Przewalski's horses, 1 Thoroughbred, and a newly sequenced Clydesdale. Using these historical genomes, we identified modern horses with higher genetic similarity to those in the past and unveiled increased inbreeding in recent times. We genotyped variants associated with appearance and behavior to uncover previously unknown characteristics of these important historical horses. Overall, we provide insights into the history of Thoroughbred and Clydesdale breeds and highlight genomic changes in the endangered Przewalski's horse following a century of captive breeding.

2.

Kiñit classification in Ethiopian chants, Azmaris and modern music: A new dataset and CNN benchmark.

Retta, Ephrem Afele; Sutcliffe, Richard; Almekhlafi, Eiad; Enku, Yosef Kefyalew; Alemu, Eyob; Gemechu, Tigist Demssice; Berwo, Michael Abebe; Mhamed, Mustafa; Feng, Jun.

PLoS One ; 18(4): e0284560, 2023.

Article in English | MEDLINE | ID: mdl-37079543

ABSTRACT

In this paper, we create EMIR, the first-ever Music Information Retrieval dataset for Ethiopian music. EMIR is freely available for research purposes and contains 600 sample recordings of Orthodox Tewahedo chants, traditional Azmari songs and contemporary Ethiopian secular music. Each sample is classified by five expert judges into one of four well-known Ethiopian Kiñits: Tizita, Bati, Ambassel and Anchihoye. Each Kiñit uses its own pentatonic scale and also has its own stylistic characteristics. Thus, Kiñit classification needs to combine scale identification with genre recognition. After describing the dataset, we present the Ethio Kiñits Model (EKM), based on VGG, for classifying the EMIR clips. In Experiment 1, we investigated whether Filterbank, Mel-spectrogram, Chroma, or Mel-frequency Cepstral coefficient (MFCC) features work best for Kiñit classification using EKM. MFCC was found to be superior and was therefore adopted for Experiment 2, where the performance of EKM models using MFCC was compared using three different audio sample lengths. A length of 3 s gave the best results. In Experiment 3, EKM and four existing models were compared on the EMIR dataset: AlexNet, ResNet50, VGG16 and LSTM. EKM was found to have the best accuracy (95.00%) as well as the fastest training time. However, the performance of VGG16 (93.00%) was found not to be significantly worse (P < 0.01). We hope this work will encourage others to explore Ethiopian music and to experiment with other models for Kiñit classification.

Subjects

Music, Singing, Humans, Benchmarking/classification, Ethiopia, Datasets as Topic/classification

3.

Extracting drug-drug interactions from no-blinding texts using key semantic sentences and GHM loss.

Chen, Jiacheng; Sun, Xia; Jin, Xin; Sutcliffe, Richard.

J Biomed Inform ; 135: 104192, 2022 Nov.

Article in English | MEDLINE | ID: mdl-36064114

ABSTRACT

The extraction of drug-drug interactions (DDIs) is an important task in the field of biomedical research, which can reduce unexpected health risks during patient treatment. Previous work indicates that methods using external drug information have a much higher performance than those methods not using it. However, the use of external drug information is time-consuming and resource-costly. In this work, we propose a novel method for extracting DDIs which does not use external drug information, but still achieves comparable performance. First, we no longer convert the drug name to standard tokens such as DRUG0, the method commonly used in previous research. Instead, full drug names with drug entity marking are input to BioBERT, allowing us to enhance the selected drug entity pair. Second, we adopt the Key Semantic Sentence approach to emphasize the words closely related to the DDI relation of the selected drug pair. After the above steps, the misclassification of similar instances which are created from the same sentence but corresponding to different pairs of drug entities can be significantly reduced. Then, we employ the Gradient Harmonizing Mechanism (GHM) loss to reduce the weight of mislabeled instances and easy-to-classify instances, both of which can lead to poor performance in DDI extraction. Overall, we demonstrate in this work that it is better not to use drug blinding with BioBERT, and show that GHM performs better than Cross-Entropy loss if the proportion of label noise is less than 30%. The proposed model achieves state-of-the-art results with an F1-score of 84.13% on the DDIExtraction 2013 corpus (a standard English DDI corpus), which fills the performance gap (4%) between methods that rely on and do not rely on external drug information.

Subjects

Biomedical Research, Semantics, Humans, Data Mining/methods, Drug Interactions

4.

Dynamic Key-Value Memory Networks With Rich Features for Knowledge Tracing.

Sun, Xia; Zhao, Xu; Li, Bo; Ma, Yuan; Sutcliffe, Richard; Feng, Jun.

IEEE Trans Cybern ; 52(8): 8239-8245, 2022 Aug.

Article in English | MEDLINE | ID: mdl-33531331

ABSTRACT

Knowledge tracing is an important research topic in student modeling. The aim is to model a student's knowledge state by mining a large number of exercise records. The dynamic key-value memory network (DKVMN) proposed for processing knowledge tracing tasks is considered to be superior to other methods. However, through our research, we have noticed that the DKVMN model ignores both the students' behavior features collected by the intelligent tutoring system (ITS) and their learning abilities, which, together, can be used to help model a student's knowledge state. We believe that a student's learning ability always changes over time. Therefore, this article proposes a new exercise record representation method, which integrates the features of students' behavior with those of the learning ability, thereby improving the performance of knowledge tracing. Our experiments show that the proposed method can improve the prediction results of DKVMN.

Subjects

Learning, Humans

5.

Improving Arabic Sentiment Analysis Using CNN-Based Architectures and Text Preprocessing.

Mhamed, Mustafa; Sutcliffe, Richard; Sun, Xia; Feng, Jun; Almekhlafi, Eiad; Retta, Ephrem Afele.

Comput Intell Neurosci ; 2021: 5538791, 2021.

Article in English | MEDLINE | ID: mdl-34545281

ABSTRACT

Sentiment analysis is an essential process which is important to many natural language applications. In this paper, we apply two models for Arabic sentiment analysis to the ASTD and ATDFS datasets, in both 2-class and multiclass forms. Model MC1 is a 2-layer CNN with global average pooling, followed by a dense layer. MC2 is a 2-layer CNN with max pooling, followed by a BiGRU and a dense layer. On the difficult ASTD 4-class task, we achieve 73.17%, compared to 65.58% reported by Attia et al., 2018. For the easier 2-class task, we achieve 90.06% with MC1 compared to 85.58% reported by Kwaik et al., 2019. We carry out experiments on various data splits, to match those used by other researchers. We also pay close attention to Arabic preprocessing and include novel steps not reported in other works. In an ablation study, we investigate the effect of two steps in particular, the processing of emoticons and the use of a custom stoplist. On the 4-class task, these can make a difference of up to 4.27% and 5.48%, respectively. On the 2-class task, the maximum improvements are 2.95% and 3.87%.

Subjects

Language

6.

Drug-Drug Interaction Extraction via Recurrent Hybrid Convolutional Neural Networks with an Improved Focal Loss.

Sun, Xia; Dong, Ke; Ma, Long; Sutcliffe, Richard; He, Feijuan; Chen, Sushing; Feng, Jun.

Entropy (Basel) ; 21(1), 2019 Jan 08.

Article in English | MEDLINE | ID: mdl-33266753

ABSTRACT

Drug-drug interactions (DDIs) may bring huge health risks and dangerous effects to a patient's body when two or more drugs are taken at the same time or within a certain period of time. Therefore, the automatic extraction of unknown DDIs has great potential for the development of pharmaceutical agents and the safety of drug use. In this article, we propose a novel recurrent hybrid convolutional neural network (RHCNN) for DDI extraction from biomedical literature. In the embedding layer, the texts mentioning two entities are represented as a sequence of semantic embeddings and position embeddings. In particular, the complete semantic embedding is obtained by fusing a word embedding with its contextual information, which is learnt by a recurrent structure. After that, the hybrid convolutional neural network is employed to learn the sentence-level features, which consist of the local context features from consecutive words and the dependency features between separated words for DDI extraction. Last but most significantly, in order to make up for the defects of the traditional cross-entropy loss function when dealing with class-imbalanced data, we apply an improved focal loss function to mitigate this problem when using the DDIExtraction 2013 dataset. In our experiments, we achieve DDI automatic extraction with a micro F-score of 75.48% on the DDIExtraction 2013 dataset, outperforming the state-of-the-art approach by 2.49%.
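
Note: the abstract above contrasts cross-entropy with a focal loss on class-imbalanced data. The minimal sketch below shows the standard focal loss, FL = -alpha_t * (1 - p_t)^gamma * log(p_t), not the authors' improved variant, whose details are not given in this record. The function names, the default gamma = 2.0, and the example probabilities are illustrative assumptions.

```python
import math


def cross_entropy(probs, target):
    """Cross-entropy for one example: -log of the true-class probability."""
    return -math.log(probs[target])


def focal_loss(probs, target, gamma=2.0, alpha=None):
    """Standard focal loss for one example (assumed form, not the paper's variant).

    probs  : predicted class probabilities (expected to sum to 1)
    target : index of the true class
    gamma  : focusing parameter; gamma = 0 recovers plain cross-entropy
    alpha  : optional list of per-class weights
    """
    p_t = probs[target]
    weight = 1.0 if alpha is None else alpha[target]
    # (1 - p_t) ** gamma shrinks the loss of confidently correct examples.
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical predictions for a three-class problem, true class = 0.
easy = [0.9, 0.05, 0.05]  # confidently correct
hard = [0.2, 0.5, 0.3]    # misclassified

for name, probs in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(probs, 0)
    fl = focal_loss(probs, 0)
    print(f"{name}: cross-entropy={ce:.4f} focal={fl:.4f}")
```

With gamma > 0, the easy example keeps a far smaller share of its cross-entropy loss than the hard one, so abundant, easily classified instances contribute less to training.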

7.

A New Phylogenetic Inference Based on Genetic Attribute Reduction for Morphological Data.

Feng, Jun; Liu, Zeyun; Feng, Hongwei; Sutcliffe, Richard F E; Liu, Jianni; Han, Jian.

Entropy (Basel) ; 21(3), 2019 Mar 22.

Article in English | MEDLINE | ID: mdl-33267027

ABSTRACT

To address the instability of phylogenetic trees in morphological datasets caused by missing values, we present a phylogenetic inference method based on a concept decision tree (CDT) in conjunction with attribute reduction. First, a reliable initial phylogenetic seed tree is created using a few species with relatively complete morphological information by using biologists' prior knowledge or by applying existing tools such as MrBayes. Second, using a top-down data processing approach, we construct concept-sample templates by performing attribute reduction at each node in the initial phylogenetic seed tree. In this way, each node is turned into a decision point with multiple concept-sample templates, providing decision-making functions for grafting. Third, we apply a novel matching algorithm to evaluate the degree of similarity between the species' attributes and their concept-sample templates and to determine the location of the species in the initial phylogenetic seed tree. In this manner, the phylogenetic tree is established step by step. We apply our algorithm to several datasets and compare it with the maximum parsimony, maximum likelihood, and Bayesian inference methods using the two evaluation criteria of accuracy and stability. The experimental results indicate that as the proportion of missing data increases, the accuracy of the CDT method remains at 86.5%, outperforming all other methods and producing a reliable phylogenetic tree.
